A non-contiguous Tree Sequence Alignment-based Model for Statistical Machine Translation

Authors

  • Jun Sun
  • Min Zhang
  • Chew Lim Tan
Abstract

The tree sequence-based translation model allows a rule to violate syntactic boundaries in order to capture non-syntactic phrases, where a tree sequence is a contiguous sequence of sub-trees. This paper goes further and presents a translation model based on non-contiguous tree sequence alignment, where a non-contiguous tree sequence is a sequence of sub-trees and gaps. Compared with the contiguous tree sequence-based model, the proposed model can handle non-contiguous phrases with gaps of any size by means of non-contiguous tree sequence alignment. A decoding algorithm targeting non-contiguous constituents is also proposed. Experimental results on the NIST MT-05 Chinese-English translation task show that the proposed model statistically significantly outperforms the baseline systems.
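
The distinction between a contiguous and a non-contiguous tree sequence can be illustrated with a minimal sketch. It is not the authors' implementation: the `SubTree` class, the word-span representation, and the `gaps` and `is_contiguous` helpers are hypothetical names introduced here, assuming each sub-tree is summarized by its root label and the half-open span of source words it covers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SubTree:
    """Hypothetical summary of a sub-tree: root label plus covered word span."""
    label: str  # root label of the sub-tree, e.g. "NP"
    start: int  # index of the first covered word (inclusive)
    end: int    # index just past the last covered word (exclusive)


def gaps(seq: List[SubTree]) -> List[Tuple[int, int]]:
    """Return the uncovered word spans between adjacent sub-trees."""
    ordered = sorted(seq, key=lambda t: t.start)
    return [
        (left.end, right.start)
        for left, right in zip(ordered, ordered[1:])
        if right.start > left.end
    ]


def is_contiguous(seq: List[SubTree]) -> bool:
    """A tree sequence is contiguous when no gap separates its sub-trees."""
    return not gaps(seq)


# Two sub-trees covering words 0-1 and word 2: no gap between them.
contiguous_seq = [SubTree("NP", 0, 2), SubTree("VV", 2, 3)]

# Two sub-trees covering word 0 and word 4: words 1-3 form a gap.
non_contiguous_seq = [SubTree("P", 0, 1), SubTree("VV", 4, 5)]

print(is_contiguous(contiguous_seq))
print(gaps(non_contiguous_seq))
```

In this sketch a gap is simply a span of source words that no sub-tree in the sequence covers; how such gaps are aligned and filled during decoding is described in the paper itself and is not modeled here.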

Similar resources

Introducing Non-Syntactic Phrases into a Syntax-Based Machine Translation System

The dominance of traditional phrase-based statistical machine translation (SMT) models (Koehn, Och, and Marcu, 2003) has recently been challenged by the development and improvement of a number of newer translation models that explicitly take into account the syntax of the sentences being translated. One simple approach to incorporating syntax is to limit the phrases learned by a standard SMT tra...

Tree-to-String Alignment Template for Statistical Machine Translation

We present a novel translation model based on tree-to-string alignment template (TAT) which describes the alignment between a source parse tree and a target string. A TAT is capable of generating both terminals and non-terminals and performing reordering at both low and high levels. The model is linguistically syntax-based because TATs are extracted automatically from word-aligned, source side p...

Meta-Structure Transformation Model for Statistical Machine Translation

We propose a novel syntax-based model for statistical machine translation in which meta-structure (MS) and meta-structure sequence (SMS) of a parse tree are defined. In this framework, a parse tree is decomposed into SMS to deal with the structure divergence and the alignment can be reconstructed at different levels of recombination of MS (RM). RM pairs extracted can perform the mapping between...

A Generalized Reordering Model for Phrase-Based Statistical Machine Translation

Phrase-based translation models are widely studied in statistical machine translation (SMT). However, the existing phrase-based translation models either cannot deal with non-contiguous phrases or reorder phrases only by rules, without an effective reordering model. In this paper, we propose a generalized reordering model (GREM) for phrase-based statistical machine translation, which is not...

Non-Isomorphic Forest Pair Translation

This paper studies two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree to tree sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translatio...

Publication date: 2009